Variable Selection and Dimension Reduction by Learning Gradients
Authors
Abstract
High-dimensional data analysis has become a challenging problem in the modern sciences. Diagnosis based on gene expression or SNP data arising in the medical and biological sciences is a typical example, and has perhaps been the most important research focus of the past decade. The number of variables in these data sets may range from tens to hundreds of thousands. Understanding the data structure and performing inference are made much more difficult by the curse of dimensionality. This has driven rapid advances in variable selection and dimension reduction techniques in the machine learning and statistics communities. Variable selection is closely related to the study of relevance and dates back at least to the mid-1990s (Tibshirani, 1996; Blum and Langley, 1997; Kohavi and John, 1997). Since then it has developed rapidly, especially after microarray data and text categorization drew the attention of researchers. A special issue on this topic was published by the Journal of Machine Learning Research in 2003. In Guyon and Elisseeff (2003) the main benefits of variable selection were summarized as threefold: improving inference performance, providing faster and more cost-effective predictors, and yielding a better understanding of the underlying process that generates the data. The many methods proposed in the literature include various correlation and information criteria (see Guyon and Elisseeff (2003) for a ...
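The learning-gradients idea behind this paper can be illustrated with a minimal numerical sketch: estimate the gradient of the regression function at each sample by a local linear fit, rank variables by the average squared partial derivative, and take the top eigenvectors of the gradient outer product matrix as dimension-reduction directions. This is only a rough illustration under invented assumptions (synthetic data, a k-nearest-neighbour local fit, an ad hoc ridge parameter), not the paper's actual RKHS-regularized estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Synthetic response that depends only on the first two variables.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=n)

def local_gradients(X, y, k=20, ridge=1e-3):
    """Estimate the gradient at each sample by a ridge-regularized
    local linear fit over its k nearest neighbours."""
    n, p = X.shape
    grads = np.zeros((n, p))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        idx = np.argsort(d)[1:k + 1]      # k nearest neighbours, excluding self
        dX = X[idx] - X[i]
        dy = y[idx] - y[i]
        A = dX.T @ dX + ridge * np.eye(p)
        grads[i] = np.linalg.solve(A, dX.T @ dy)
    return grads

G = local_gradients(X, y)
# Variable importance: average squared partial derivative per coordinate.
importance = (G ** 2).mean(axis=0)
# Gradient outer product matrix; its leading eigenvectors span the
# effective dimension-reduction (EDR) space.
M = G.T @ G / n
eigvals = np.linalg.eigvalsh(M)           # ascending order
```

With this construction, `importance` is largest for the two truly relevant coordinates, and the spectrum of `M` concentrates on the low-dimensional predictive subspace.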
Similar resources
Estimating variable structure and dependence in multi-task learning via gradients
We consider the problem of learning gradients in the supervised setting where there are multiple, related tasks. Gradients provide a natural interpretation to the geometric structure of data, and can assist in problems requiring variable selection and dimension reduction. By extending this idea to the multi-task learning (MTL) environment, we present methods for simultaneously learning variable...
Learning gradients: predictive models that infer geometry and dependence
This paper develops and discusses a modeling framework called learning gradients that allows for predictive models that simultaneously infer the geometry and statistical dependencies of the input space relevant for prediction. The geometric relations addressed in this paper hold for Euclidean spaces as well as the manifold setting. The central quantity in this framework is an estimate of the gr...
Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gene Expression
Nowadays, the increasing volume of data and the number of attributes in data sets have reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection is a dimensionality reduction method, carried out through filter and wrapper approaches. Wrapper methods are more accurate than filter methods, but filter methods are faster and have a lower computational burden. With ...
Learning Gradients: Predictive Models that Infer Geometry and Statistical Dependence
The problems of dimension reduction and inference of statistical dependence are addressed by the modeling framework of learning gradients. The models we propose hold for Euclidean spaces as well as the manifold setting. The central quantity in this approach is an estimate of the gradient of the regression or classification function. Two quadratic forms are constructed from gradient estimates: t...
Learning gradients on manifolds
A common belief in high dimensional data analysis is that data is concentrated on a low dimensional manifold. This motivates simultaneous dimension reduction and regression on manifolds. We provide an algorithm for learning gradients on manifolds for dimension reduction for high dimensional data with few observations. We obtain generalization error bounds for the gradient estimates and show tha...
Publication date: 2008